Defintion of Tasks:
-send the data – Audrey for today - translate sizes into measurable area – Shaji - Tuesday - write R code – Daniel – Wednesday / Thursday -conclusions & analysis – Ross / Thursday / Friday - organise another meeting on Friday - Audrey
Overview
Subject matter: Regression
With focus to look at:
Target variable: quantities Exploratory variables: month (translated into scale to define ‘coldness’ from 1 to 4 depending on season) type of scarf (weight soft Boolean), price (Actualsalesamt_central / quantity), size (calculate the area based on sizegrouping accessories
Shaji:
I’ve completed the tasks we agreed upon, uploaded the file with a “_cleansed” added to the file name.
Following is what I have done.
Added 4 new columns, cal_month_DF actualsalesamt_central_DF weight_softs_DF sizegroupings_accessories_DF
Here are the values.
cal_month_DF Winter 12,1,2 = 1 Spring 3,4,5 = 2 Summer 6,7,8 = 3 Autumn 9,10,11 = 4
actualsalesamt_central_DF = actualsalesamt_central/quantity
weight_softs_DF Heavy = 1 Light = 2
sizegroupings_accessories_DF Area of the scarf calculated and numeric weighting given as follows.
Size Area Weighting 110 X 6 CM 660 1 130 X 6.5 CM 845 2 175 X 8 CM 1400 3 50 X 50 CM 2500 4 130 X 20 CM 2600 5 120 X 25 CM 3000 6 170 X 20 CM 3400 7 60 X 60 CM 3600 8 155 X 30 CM 4650 9 200 X 25 CM 5000 10 168 X 30 CM 5040 11 180 X 30 CM 5400 12 230 X 25 CM 5750 13 200 X 30 CM 6000 14 80 X 80 CM 6400 15 180 X 36 CM 6480 16 180 X 37 CM 6660 17 230 X 30 CM 6900 18 200 X 36 CM 7200 19 250 X 30 CM 7500 20 90 X 90 CM 8100 21 200 X 45 CM 9000 22 98 X 98 CM 9604 23 200 X 50 CM 10000 24 200 X 60 CM 12000 25 180 X 70 CM 12600 26 190 X 70 CM 13300 27 164 X 82 CM 13448 28 200 X 70 CM 14000 29 167 X 85 CM 14195 30 120 X 120 CM 14400 31 220 X 70 CM 15400 32 180 X 90 CM 16200 33 175 X 100 CM 17500 34 200 X 90 CM 18000 35 140 X 140 CM 19600 36 200 X 100 CM 20000 37 145 X 145 CM 21025 38 270 X 90 CM 24300 39 200 X 140 CM 28000 40
We are working to examine the relationship between the purchasing of scarves at Burberry and several different variables, we will apply our learnings in regression to improve our understanding of the correlation between variables.
install.packages("pROC")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/pROC_1.13.0.tgz'
Content type 'application/x-gzip' length 845216 bytes (825 KB)
==================================================
downloaded 825 KB
The downloaded binary packages are in
/var/folders/gs/pcrbzxsn7qjb5_fsh74q_zqsz6sl1g/T//Rtmp0KmaPx/downloaded_packages
install.packages("caTools")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/caTools_1.17.1.1.tgz'
Content type 'application/x-gzip' length 243318 bytes (237 KB)
==================================================
downloaded 237 KB
The downloaded binary packages are in
/var/folders/gs/pcrbzxsn7qjb5_fsh74q_zqsz6sl1g/T//Rtmp0KmaPx/downloaded_packages
install.packages("tidyverse")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/tidyverse_1.2.1.tgz'
Content type 'application/x-gzip' length 88754 bytes (86 KB)
==================================================
downloaded 86 KB
The downloaded binary packages are in
/var/folders/gs/pcrbzxsn7qjb5_fsh74q_zqsz6sl1g/T//Rtmp0KmaPx/downloaded_packages
library(plyr)
library(pROC)
Type 'citation("pROC")' for a citation.
Attaching package: ‘pROC’
The following objects are masked from ‘package:stats’:
cov, smooth, var
library(caTools)
library(ggplot2)
library(plotly)
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following objects are masked from ‘package:plyr’:
arrange, mutate, rename, summarise
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
library(tidyverse)
[30m── [1mAttaching packages[22m ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──[39m
[30m[32m✔[30m [34mtibble [30m 2.0.1 [32m✔[30m [34mpurrr [30m 0.2.5
[32m✔[30m [34mtidyr [30m 0.8.1 [32m✔[30m [34mdplyr [30m 0.7.8
[32m✔[30m [34mreadr [30m 1.1.1 [32m✔[30m [34mstringr[30m 1.3.1
[32m✔[30m [34mtibble [30m 2.0.1 [32m✔[30m [34mforcats[30m 0.3.0[39m
package ‘tibble’ was built under R version 3.5.2[30m── [1mConflicts[22m ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[30m [34mdplyr[30m::[32marrange()[30m masks [34mplotly[30m::arrange(), [34mplyr[30m::arrange()
[31m✖[30m [34mpurrr[30m::[32mcompact()[30m masks [34mplyr[30m::compact()
[31m✖[30m [34mdplyr[30m::[32mcount()[30m masks [34mplyr[30m::count()
[31m✖[30m [34mdplyr[30m::[32mfailwith()[30m masks [34mplyr[30m::failwith()
[31m✖[30m [34mdplyr[30m::[32mfilter()[30m masks [34mplotly[30m::filter(), [34mstats[30m::filter()
[31m✖[30m [34mdplyr[30m::[32mid()[30m masks [34mplyr[30m::id()
[31m✖[30m [34mdplyr[30m::[32mlag()[30m masks [34mstats[30m::lag()
[31m✖[30m [34mdplyr[30m::[32mmutate()[30m masks [34mplotly[30m::mutate(), [34mplyr[30m::mutate()
[31m✖[30m [34mdplyr[30m::[32mrename()[30m masks [34mplotly[30m::rename(), [34mplyr[30m::rename()
[31m✖[30m [34mdplyr[30m::[32msummarise()[30m masks [34mplotly[30m::summarise(), [34mplyr[30m::summarise()
[31m✖[30m [34mdplyr[30m::[32msummarize()[30m masks [34mplyr[30m::summarize()[39m
install.packages("corrplot")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/corrplot_0.84.tgz'
Content type 'application/x-gzip' length 5451964 bytes (5.2 MB)
==================================================
downloaded 5.2 MB
The downloaded binary packages are in
/var/folders/gs/pcrbzxsn7qjb5_fsh74q_zqsz6sl1g/T//Rtmp0KmaPx/downloaded_packages
library(corrplot)
corrplot 0.84 loaded
install.packages("readxl")
Error in install.packages : Updating loaded packages
library(readxl)
install.packages("tibble")
Error in install.packages : Updating loaded packages
install.packages("dplyr")
Error in install.packages : Updating loaded packages
library(dplyr)
Import the main data to dataset variable and a summary statistics (sheet 2) imported into another dataset.
dataset <- read_xlsx("DF - regression_cleansed.xlsx", sheet = 1) # Raw sales data
-
|
/
-
\
|
/
-
\
|
/
-
\
|
/
-
\
|
/
-
\
|
/
-
\
|
/
-
\
|
/
-
\
|
/
Expecting numeric in AN2077 / R2077C40: got 'LARGE'
-
\
|
/
-
\
|
/
Expecting numeric in AN29273 / R29273C40: got 'LARGE'
-
\
|
/
-
\
Expecting numeric in AK50927 / R50927C37: got 'MID'Expecting numeric in AN51348 / R51348C40: got 'SMALL'
|
Expecting numeric in AN52714 / R52714C40: got 'SMALL'Expecting numeric in AK53564 / R53564C37: got 'MID'Expecting numeric in AK53821 / R53821C37: got 'MID'Expecting numeric in AN54041 / R54041C40: got 'SMALL'Expecting numeric in AK54063 / R54063C37: got 'MID'Expecting numeric in AK54301 / R54301C37: got 'MID'Expecting numeric in AK54429 / R54429C37: got 'MID'Expecting numeric in AK54430 / R54430C37: got 'MID'Expecting numeric in AN55260 / R55260C40: got 'SMALL'
/
Expecting numeric in AN57012 / R57012C40: got 'LARGE'
-
\
|
/
Expecting numeric in AN71255 / R71255C40: got 'SMALL'Expecting numeric in AN71627 / R71627C40: got 'LARGE'Expecting numeric in AN72368 / R72368C40: got 'LARGE'
-
\
Expecting numeric in AK77883 / R77883C37: got 'MID'Expecting numeric in AK79008 / R79008C37: got 'MID'Expecting numeric in AN79392 / R79392C40: got 'LARGE'Expecting numeric in AN79393 / R79393C40: got 'LARGE'Expecting numeric in AK79959 / R79959C37: got 'MID'Expecting numeric in AK79966 / R79966C37: got 'MID'
|
Expecting numeric in AK80289 / R80289C37: got 'MID'Expecting numeric in AN81083 / R81083C40: got 'LARGE'Expecting numeric in AK81211 / R81211C37: got 'MID'Expecting numeric in AN81418 / R81418C40: got 'LARGE'Expecting numeric in AN81437 / R81437C40: got 'LARGE'Expecting numeric in AK81867 / R81867C37: got 'MID'Expecting numeric in AN82033 / R82033C40: got 'SMALL'Expecting numeric in AK82314 / R82314C37: got 'MID'Expecting numeric in AK82434 / R82434C37: got 'MID'Expecting numeric in AK83019 / R83019C37: got 'MID'Expecting numeric in AN83266 / R83266C40: got 'LARGE'
/
Expecting numeric in AK83746 / R83746C37: got 'MID'Expecting numeric in AN84268 / R84268C40: got 'LARGE'Expecting numeric in AK84587 / R84587C37: got 'MID'Expecting numeric in AK84857 / R84857C37: got 'MID'Expecting numeric in AK84942 / R84942C37: got 'MID'Expecting numeric in AN85233 / R85233C40: got 'SMALL'Expecting numeric in AN85332 / R85332C40: got 'SMALL'Expecting numeric in AN85334 / R85334C40: got 'SMALL'Expecting numeric in AN85470 / R85470C40: got 'LARGE'Expecting numeric in AN85486 / R85486C40: got 'SMALL'Expecting numeric in AK85701 / R85701C37: got 'MID'Expecting numeric in AK85703 / R85703C37: got 'MID'Expecting numeric in AN85824 / R85824C40: got 'SMALL'Expecting numeric in AN86078 / R86078C40: got 'LARGE'Expecting numeric in AK86219 / R86219C37: got 'MID'Expecting numeric in AK86404 / R86404C37: got 'MID'Expecting numeric in AN86423 / R86423C40: got 'SMALL'Expecting numeric in AN86458 / R86458C40: got 'LARGE'Expecting numeric in AK86548 / R86548C37: got 'MID'Expecting numeric in AK86553 / R86553C37: got 'MID'Expecting numeric in AN86567 / R86567C40: got 'SMALL'Expecting numeric in AN86610 / R86610C40: got 'LARGE'Expecting numeric in AN86997 / R86997C40: got 'LARGE'Expecting numeric in AK87057 / R87057C37: got 'MID'Expecting numeric in AK87058 / R87058C37: got 'MID'
-
Expecting numeric in AN87117 / R87117C40: got 'LARGE'Expecting numeric in AK87118 / R87118C37: got 'MID'Expecting numeric in AK87144 / R87144C37: got 'MID'Expecting numeric in AN87147 / R87147C40: got 'LARGE'Expecting numeric in AN87280 / R87280C40: got 'LARGE'Expecting numeric in AK87478 / R87478C37: got 'MID'Expecting numeric in AK87552 / R87552C37: got 'MID'Expecting numeric in AN87592 / R87592C40: got 'LARGE'Expecting numeric in AK87719 / R87719C37: got 'MID'Expecting numeric in AK87727 / R87727C37: got 'MID'Expecting numeric in AK87837 / R87837C37: got 'MID'Expecting numeric in AN87852 / R87852C40: got 'SMALL'Expecting numeric in AN88076 / R88076C40: got 'LARGE'Expecting numeric in AN88289 / R88289C40: got 'LARGE'Expecting numeric in AK88355 / R88355C37: got 'MID'Expecting numeric in AK88420 / R88420C37: got 'MID'Expecting numeric in AK88424 / R88424C37: got 'MID'Expecting numeric in AN88650 / R88650C40: got 'SMALL'Expecting numeric in AK88663 / R88663C37: got 'MID'Expecting numeric in AN88856 / R88856C40: got 'LARGE'Expecting numeric in AN89042 / R89042C40: got 'LARGE'Expecting numeric in AN89115 / R89115C40: got 'SMALL'Expecting numeric in AK89273 / R89273C37: got 'MID'Expecting numeric in AK89291 / R89291C37: got 'MID'Expecting numeric in AK89336 / R89336C37: got 'MID'Expecting numeric in AK89461 / R89461C37: got 'MID'Expecting numeric in AK89484 / R89484C37: got 'MID'Expecting numeric in AK89558 / R89558C37: got 'MID'Expecting numeric in AN89715 / R89715C40: got 'LARGE'Expecting numeric in AN89778 / R89778C40: got 'SMALL'Expecting numeric in AK89804 / R89804C37: got 'MID'Expecting numeric in AK89890 / R89890C37: got 'MID'Expecting numeric in AN90019 / R90019C40: got 'SMALL'Expecting numeric in AN90220 / R90220C40: got 'LARGE'Expecting numeric in AN90221 / R90221C40: got 'LARGE'Expecting numeric in AN90378 / R90378C40: got 'LARGE'Expecting numeric in AN90441 / R90441C40: got 'LARGE'Expecting numeric in AK90463 / R90463C37: got 'MID'Expecting numeric in AK90495 / R90495C37: got 'MID'Expecting numeric in AK90524 / R90524C37: got 'MID'
\
Expecting numeric in AK90597 / R90597C37: got 'MID'Expecting numeric in AK90734 / R90734C37: got 'MID'Expecting numeric in AN90966 / R90966C40: got 'SMALL'Expecting numeric in AK91433 / R91433C37: got 'MID'Expecting numeric in AN91445 / R91445C40: got 'LARGE'Expecting numeric in AK91450 / R91450C37: got 'MID'Expecting numeric in AN91618 / R91618C40: got 'LARGE'Expecting numeric in AK91901 / R91901C37: got 'MID'Expecting numeric in AN92062 / R92062C40: got 'SMALL'Expecting numeric in AK92195 / R92195C37: got 'MID'Expecting numeric in AK92263 / R92263C37: got 'MID'Expecting numeric in AK92268 / R92268C37: got 'MID'Expecting numeric in AK92429 / R92429C37: got 'MID'Expecting numeric in AK92431 / R92431C37: got 'MID'Expecting numeric in AK92603 / R92603C37: got 'MID'Expecting numeric in AN92678 / R92678C40: got 'SMALL'Expecting numeric in AK92783 / R92783C37: got 'MID'Expecting numeric in AN92934 / R92934C40: got 'LARGE'Expecting numeric in AN92943 / R92943C40: got 'LARGE'Expecting numeric in AN92993 / R92993C40: got 'LARGE'Expecting numeric in AN93011 / R93011C40: got 'LARGE'Expecting numeric in AK93293 / R93293C37: got 'MID'Expecting numeric in AK93625 / R93625C37: got 'MID'Expecting numeric in AN93748 / R93748C40: got 'LARGE'Expecting numeric in AN93754 / R93754C40: got 'SMALL'Expecting numeric in AK93816 / R93816C37: got 'MID'Expecting numeric in AN93830 / R93830C40: got 'LARGE'Expecting numeric in AK93880 / R93880C37: got 'MID'Expecting numeric in AK94054 / R94054C37: got 'MID'
|
Expecting numeric in AK94071 / R94071C37: got 'MID'Expecting numeric in AK94077 / R94077C37: got 'MID'Expecting numeric in AK94079 / R94079C37: got 'MID'Expecting numeric in AN94113 / R94113C40: got 'SMALL'Expecting numeric in AN94118 / R94118C40: got 'LARGE'Expecting numeric in AN94251 / R94251C40: got 'LARGE'Expecting numeric in AK94432 / R94432C37: got 'MID'Expecting numeric in AN94620 / R94620C40: got 'LARGE'Expecting numeric in AN94637 / R94637C40: got 'SMALL'Expecting numeric in AN94643 / R94643C40: got 'SMALL'Expecting numeric in AK94766 / R94766C37: got 'MID'Expecting numeric in AK94935 / R94935C37: got 'MID'Expecting numeric in AK94963 / R94963C37: got 'MID'Expecting numeric in AK95020 / R95020C37: got 'MID'Expecting numeric in AK95126 / R95126C37: got 'MID'Expecting numeric in AN95215 / R95215C40: got 'LARGE'Expecting numeric in AN95619 / R95619C40: got 'SMALL'Expecting numeric in AK95746 / R95746C37: got 'MID'Expecting numeric in AN95756 / R95756C40: got 'LARGE'Expecting numeric in AN95927 / R95927C40: got 'LARGE'Expecting numeric in AN95928 / R95928C40: got 'LARGE'Expecting numeric in AN95979 / R95979C40: got 'LARGE'Expecting numeric in AK96049 / R96049C37: got 'MID'Expecting numeric in AN96064 / R96064C40: got 'LARGE'Expecting numeric in AN96148 / R96148C40: got 'LARGE'Expecting numeric in AN96153 / R96153C40: got 'LARGE'Expecting numeric in AN96167 / R96167C40: got 'LARGE'Expecting numeric in AN96274 / R96274C40: got 'LARGE'Expecting numeric in AN96301 / R96301C40: got 'SMALL'Expecting numeric in AN96495 / R96495C40: got 'LARGE'Expecting numeric in AK96644 / R96644C37: got 'MID'Expecting numeric in AK96685 / R96685C37: got 'MID'Expecting numeric in AN96739 / R96739C40: got 'SMALL'Expecting numeric in AN96741 / R96741C40: got 'LARGE'Expecting numeric in AK97083 / R97083C37: got 'MID'Expecting numeric in AN97088 / R97088C40: got 'LARGE'Expecting numeric in AN97161 / R97161C40: got 'LARGE'Expecting numeric in AN97206 / R97206C40: got 'SMALL'Expecting numeric in AN97507 / R97507C40: got 'SMALL'
/
Expecting numeric in AN97681 / R97681C40: got 'LARGE'Expecting numeric in AN97733 / R97733C40: got 'LARGE'Expecting numeric in AN97819 / R97819C40: got 'LARGE'Expecting numeric in AK97915 / R97915C37: got 'MID'Expecting numeric in AN97918 / R97918C40: got 'SMALL'Expecting numeric in AN97937 / R97937C40: got 'LARGE'Expecting numeric in AK98083 / R98083C37: got 'MID'Expecting numeric in AK98084 / R98084C37: got 'MID'Expecting numeric in AK98087 / R98087C37: got 'MID'Expecting numeric in AN98104 / R98104C40: got 'SMALL'Expecting numeric in AN98154 / R98154C40: got 'SMALL'Expecting numeric in AK98332 / R98332C37: got 'MID'Expecting numeric in AN98514 / R98514C40: got 'LARGE'Expecting numeric in AK98903 / R98903C37: got 'MID'Expecting numeric in AN99320 / R99320C40: got 'LARGE'Expecting numeric in AN99674 / R99674C40: got 'LARGE'Expecting numeric in AK99707 / R99707C37: got 'MID'Expecting numeric in AN99727 / R99727C40: got 'SMALL'Expecting numeric in AK99874 / R99874C37: got 'MID'Expecting numeric in AN100125 / R100125C40: got 'LARGE'Expecting numeric in AK100129 / R100129C37: got 'MID'Expecting numeric in AK100182 / R100182C37: got 'MID'Expecting numeric in AK100243 / R100243C37: got 'MID'Expecting numeric in AN100249 / R100249C40: got 'LARGE'Expecting numeric in AK100619 / R100619C37: got 'MID'Expecting numeric in AN100898 / R100898C40: got 'LARGE'
-
Expecting numeric in AK101029 / R101029C37: got 'MID'Expecting numeric in AN101083 / R101083C40: got 'LARGE'Expecting numeric in AN101096 / R101096C40: got 'SMALL'Expecting numeric in AN101157 / R101157C40: got 'SMALL'Expecting numeric in AN101182 / R101182C40: got 'SMALL'Expecting numeric in AK101315 / R101315C37: got 'MID'Expecting numeric in AK101522 / R101522C37: got 'MID'Expecting numeric in AN101839 / R101839C40: got 'LARGE'Expecting numeric in AK101950 / R101950C37: got 'MID'Expecting numeric in AN102190 / R102190C40: got 'LARGE'Expecting numeric in AN102199 / R102199C40: got 'SMALL'Expecting numeric in AN102259 / R102259C40: got 'LARGE'Expecting numeric in AN102327 / R102327C40: got 'SMALL'Expecting numeric in AK102543 / R102543C37: got 'MID'Expecting numeric in AN102621 / R102621C40: got 'SMALL'Expecting numeric in AK102632 / R102632C37: got 'MID'Expecting numeric in AK102931 / R102931C37: got 'MID'Expecting numeric in AK103004 / R103004C37: got 'MID'Expecting numeric in AN103024 / R103024C40: got 'SMALL'Expecting numeric in AK103164 / R103164C37: got 'MID'Expecting numeric in AK103174 / R103174C37: got 'MID'Expecting numeric in AK103348 / R103348C37: got 'MID'Expecting numeric in AN103640 / R103640C40: got 'LARGE'Expecting numeric in AN103712 / R103712C40: got 'LARGE'Expecting numeric in AN103804 / R103804C40: got 'LARGE'Expecting numeric in AN103838 / R103838C40: got 'LARGE'Expecting numeric in AK104086 / R104086C37: got 'MID'Expecting numeric in AK104217 / R104217C37: got 'MID'Expecting numeric in AN104428 / R104428C40: got 'LARGE'
\
Expecting numeric in AN104496 / R104496C40: got 'LARGE'Expecting numeric in AN104553 / R104553C40: got 'LARGE'Expecting numeric in AK104655 / R104655C37: got 'MID'Expecting numeric in AN104958 / R104958C40: got 'LARGE'Expecting numeric in AK104967 / R104967C37: got 'MID'Expecting numeric in AK105027 / R105027C37: got 'MID'Expecting numeric in AK105284 / R105284C37: got 'MID'Expecting numeric in AK105288 / R105288C37: got 'MID'Expecting numeric in AK105542 / R105542C37: got 'MID'Expecting numeric in AK105543 / R105543C37: got 'MID'Expecting numeric in AK105586 / R105586C37: got 'MID'Expecting numeric in AK105759 / R105759C37: got 'MID'Expecting numeric in AN105788 / R105788C40: got 'SMALL'Expecting numeric in AK105789 / R105789C37: got 'MID'Expecting numeric in AN105847 / R105847C40: got 'LARGE'Expecting numeric in AN105909 / R105909C40: got 'LARGE'Expecting numeric in AK105978 / R105978C37: got 'MID'Expecting numeric in AN106222 / R106222C40: got 'LARGE'Expecting numeric in AK106248 / R106248C37: got 'MID'Expecting numeric in AK106317 / R106317C37: got 'MID'Expecting numeric in AN106761 / R106761C40: got 'LARGE'Expecting numeric in AN106762 / R106762C40: got 'LARGE'Expecting numeric in AK106932 / R106932C37: got 'MID'Expecting numeric in AN107163 / R107163C40: got 'LARGE'Expecting numeric in AN107247 / R107247C40: got 'LARGE'Expecting numeric in AN107454 / R107454C40: got 'SMALL'Expecting numeric in AN107458 / R107458C40: got 'LARGE'Expecting numeric in AN107623 / R107623C40: got 'LARGE'Expecting numeric in AN107783 / R107783C40: got 'SMALL'Expecting numeric in AN107827 / R107827C40: got 'LARGE'Expecting numeric in AN107913 / R107913C40: got 'LARGE'
|
Expecting numeric in AN108061 / R108061C40: got 'LARGE'Expecting numeric in AK108130 / R108130C37: got 'MID'Expecting numeric in AN108633 / R108633C40: got 'LARGE'Expecting numeric in AN108781 / R108781C40: got 'SMALL'Expecting numeric in AN108785 / R108785C40: got 'LARGE'Expecting numeric in AN108874 / R108874C40: got 'LARGE'Expecting numeric in AK108958 / R108958C37: got 'MID'Expecting numeric in AN109058 / R109058C40: got 'SMALL'Expecting numeric in AN109259 / R109259C40: got 'LARGE'Expecting numeric in AN109461 / R109461C40: got 'LARGE'Expecting numeric in AK109587 / R109587C37: got 'MID'Expecting numeric in AK109679 / R109679C37: got 'MID'Expecting numeric in AK109681 / R109681C37: got 'MID'Expecting numeric in AK109965 / R109965C37: got 'MID'Expecting numeric in AN110040 / R110040C40: got 'LARGE'Expecting numeric in AN110082 / R110082C40: got 'SMALL'Expecting numeric in AK110100 / R110100C37: got 'MID'Expecting numeric in AK110303 / R110303C37: got 'MID'Expecting numeric in AN110802 / R110802C40: got 'LARGE'Expecting numeric in AK110858 / R110858C37: got 'MID'Expecting numeric in AN110952 / R110952C40: got 'LARGE'
/
Expecting numeric in AK111603 / R111603C37: got 'MID'Expecting numeric in AN111622 / R111622C40: got 'LARGE'Expecting numeric in AN111807 / R111807C40: got 'SMALL'Expecting numeric in AK111892 / R111892C37: got 'MID'Expecting numeric in AN111893 / R111893C40: got 'SMALL'Expecting numeric in AN112362 / R112362C40: got 'LARGE'Expecting numeric in AN112648 / R112648C40: got 'SMALL'Expecting numeric in AN112897 / R112897C40: got 'LARGE'Expecting numeric in AN112954 / R112954C40: got 'LARGE'Expecting numeric in AK112966 / R112966C37: got 'MID'Expecting numeric in AK112983 / R112983C37: got 'MID'Expecting numeric in AK113400 / R113400C37: got 'MID'Expecting numeric in AK113446 / R113446C37: got 'MID'Expecting numeric in AN114030 / R114030C40: got 'LARGE'Expecting numeric in AN114180 / R114180C40: got 'LARGE'Expecting numeric in AN114195 / R114195C40: got 'SMALL'Expecting numeric in AN114390 / R114390C40: got 'LARGE'Expecting numeric in AN114561 / R114561C40: got 'LARGE'Expecting numeric in AK114692 / R114692C37: got 'MID'Expecting numeric in AN114919 / R114919C40: got 'LARGE'Expecting numeric in AK114950 / R114950C37: got 'MID'Expecting numeric in AN114968 / R114968C40: got 'LARGE'Expecting numeric in AN114969 / R114969C40: got 'LARGE'
-
Expecting numeric in AN114999 / R114999C40: got 'LARGE'Expecting numeric in AN115336 / R115336C40: got 'SMALL'Expecting numeric in AK115582 / R115582C37: got 'MID'Expecting numeric in AK115666 / R115666C37: got 'MID'Expecting numeric in AK116011 / R116011C37: got 'MID'Expecting numeric in AN116084 / R116084C40: got 'SMALL'Expecting numeric in AK116690 / R116690C37: got 'MID'Expecting numeric in AK116726 / R116726C37: got 'MID'Expecting numeric in AN116790 / R116790C40: got 'SMALL'Expecting numeric in AK116835 / R116835C37: got 'MID'
\
dataset2 <- read_xlsx("DF - regression_cleansed.xlsx", sheet = 2) # statistical sales data
-
/
Show summaries of information of the Dataset to ensure data has been alocated to a dataframe.
summary(dataset2)
Month DF_month Count of Sales Total Sales by Month Weight 1 Weight 2 M F Local
Min. : 1.00 Min. :1.00 Min. : 6921 Min. :2097201 Min. : 2341 Min. :1715 Min. :1203 Min. :2219 Min. : 915
1st Qu.: 3.75 1st Qu.:1.75 1st Qu.: 7563 1st Qu.:2287530 1st Qu.: 3308 1st Qu.:2895 1st Qu.:1372 1st Qu.:3097 1st Qu.:1238
Median : 6.50 Median :2.50 Median : 9464 Median :2947630 Median : 4916 Median :4460 Median :1899 Median :3503 Median :1894
Mean : 6.50 Mean :2.50 Mean : 9741 Mean :3091731 Mean : 5532 Mean :3909 Mean :1933 Mean :3660 Mean :2031
3rd Qu.: 9.25 3rd Qu.:3.25 3rd Qu.:11269 3rd Qu.:3704559 3rd Qu.: 6929 3rd Qu.:4667 3rd Qu.:2252 3rd Qu.:4131 3rd Qu.:2389
Max. :12.00 Max. :4.00 Max. :14494 Max. :4777180 Max. :10520 Max. :5169 Max. :3080 Max. :5616 Max. :5064
Tourist APAC MIDDLE EAST EUROPE AMERICAS REST OF EMEIA
Min. :2204 Min. :1493 Min. :219.0 Min. : 900 Min. :251.0 Min. :175.0
1st Qu.:2967 1st Qu.:1840 1st Qu.:273.0 1st Qu.:1253 1st Qu.:458.8 1st Qu.:202.0
Median :3257 Median :2110 Median :288.5 Median :1838 Median :532.0 Median :232.0
Mean :3555 Mean :2446 Mean :329.8 Mean :2009 Mean :531.4 Mean :267.4
3rd Qu.:3820 3rd Qu.:2924 3rd Qu.:348.0 3rd Qu.:2449 3rd Qu.:572.0 3rd Qu.:323.0
Max. :5911 Max. :4062 Max. :642.0 Max. :4794 Max. :937.0 Max. :414.0
str(dataset2)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 12 obs. of 15 variables:
$ Month : num 11 12 10 9 8 7 6 5 4 3 ...
$ DF_month : num 4 1 4 4 3 3 3 2 2 2 ...
$ Count of Sales : num 11024 13887 14494 12005 10044 ...
$ Total Sales by Month: num 3664084 4542694 4777180 3825982 3122719 ...
$ Weight 1 : num 7638 10520 9349 6693 4776 ...
$ Weight 2 : num 2504 3025 4616 5058 5169 ...
$ M : num 2223 3080 2800 2337 1699 ...
$ F : num 3483 4466 5616 4493 4019 ...
$ Local : num 2608 5064 2362 1998 1239 ...
$ Tourist : num 3132 3149 5911 4753 4412 ...
$ APAC : num 1854 1899 4062 3487 3387 ...
$ MIDDLE EAST : num 413 642 264 276 293 219 284 248 304 402 ...
$ EUROPE : num 2683 4794 2627 2005 1261 ...
$ AMERICAS : num 557 479 937 692 531 472 419 533 550 617 ...
$ REST OF EMEIA : num 232 391 380 291 175 176 203 212 304 414 ...
See if any correlations can be found in raw table form:
cor(dataset2)
Month DF_month Count of Sales Total Sales by Month Weight 1 Weight 2 M F Local Tourist
Month 1.0000000 0.58297525 0.75091270 0.75841834 0.6333090 0.20364200 0.63301875 0.6522432 0.49150609 0.50153100
DF_month 0.5829752 1.00000000 0.34394577 0.34069263 0.1032196 0.49698968 0.14556165 0.4456899 -0.26456514 0.65350642
Count of Sales 0.7509127 0.34394577 1.00000000 0.99567928 0.9189764 0.08324705 0.95434049 0.9243155 0.71387637 0.69701430
Total Sales by Month 0.7584183 0.34069263 0.99567928 1.00000000 0.9425971 0.01117762 0.94749622 0.9012294 0.71842392 0.67897392
Weight 1 0.6333090 0.10321964 0.91897643 0.94259710 1.0000000 -0.31261878 0.92335925 0.7312283 0.83099149 0.45340365
Weight 2 0.2036420 0.49698968 0.08324705 0.01117762 -0.3126188 1.00000000 -0.03752924 0.3773841 -0.35393441 0.52136126
M 0.6330187 0.14556165 0.95434049 0.94749622 0.9233593 -0.03752924 1.00000000 0.8517190 0.82772001 0.53526513
F 0.6522432 0.44568992 0.92431550 0.90122937 0.7312283 0.37738405 0.85171902 1.0000000 0.46161227 0.88223816
Local 0.4915061 -0.26456514 0.71387637 0.71842392 0.8309915 -0.35393441 0.82772001 0.4616123 1.00000000 0.01517381
Tourist 0.5015310 0.65350642 0.69701430 0.67897392 0.4534036 0.52136126 0.53526513 0.8822382 0.01517381 1.00000000
APAC 0.4018405 0.62105313 0.54133380 0.52846540 0.3177338 0.50484212 0.36454822 0.7541036 -0.15277866 0.95862421
MIDDLE EAST 0.3955311 -0.36438741 0.48630855 0.48435247 0.6216005 -0.37165305 0.61285866 0.1991872 0.92684667 -0.25439863
EUROPE 0.5265591 -0.19855337 0.77008891 0.77648755 0.8733861 -0.33473074 0.86811550 0.5282485 0.99338651 0.09534128
AMERICAS 0.5408857 0.68729672 0.66920353 0.63656813 0.3751451 0.62017541 0.54826047 0.8235065 0.11619953 0.82761627
REST OF EMEIA 0.2430476 -0.05972951 0.66704500 0.62673814 0.5727759 0.15893665 0.76901933 0.6447770 0.69073750 0.31461610
APAC MIDDLE EAST EUROPE AMERICAS REST OF EMEIA
Month 0.40184045 0.3955311 0.52655911 0.5408857 0.24304763
DF_month 0.62105313 -0.3643874 -0.19855337 0.6872967 -0.05972951
Count of Sales 0.54133380 0.4863086 0.77008891 0.6692035 0.66704500
Total Sales by Month 0.52846540 0.4843525 0.77648755 0.6365681 0.62673814
Weight 1 0.31773385 0.6216005 0.87338609 0.3751451 0.57277590
Weight 2 0.50484212 -0.3716530 -0.33473074 0.6201754 0.15893665
M 0.36454822 0.6128587 0.86811550 0.5482605 0.76901933
F 0.75410363 0.1991872 0.52824849 0.8235065 0.64477703
Local -0.15277866 0.9268467 0.99338651 0.1161995 0.69073750
Tourist 0.95862421 -0.2543986 0.09534128 0.8276163 0.31461610
APAC 1.00000000 -0.4017989 -0.08750726 0.6585824 0.08412221
MIDDLE EAST -0.40179890 1.0000000 0.89299865 -0.1070629 0.54681625
EUROPE -0.08750726 0.8929986 1.00000000 0.2022545 0.71685140
AMERICAS 0.65858240 -0.1070629 0.20225454 1.0000000 0.56597889
REST OF EMEIA 0.08412221 0.5468163 0.71685140 0.5659789 1.00000000
Look at corrlations in a chart form (easier to read)
corrplot(cor(dataset2))
Any immediate correlations?
Interesting ones are Weight_softs_DF (value 1 “Heavy”) and Male has a stronger correlation than Females.
plot(dataset2$Month, dataset2$`Total Sales by Month`)
plot(dataset2$Month, dataset2$`Count of Sales`)
plot_ly(data = dataset2, x = dataset2$Month, y = dataset2$`Count of Sales`, color = dataset2$DF_month)
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
No trace type specified:
Based on info supplied, a 'scatter' trace seems appropriate.
Read more about this trace type -> https://plot.ly/r/reference/#scatter
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
Generate test data from stats dataset.
set.seed(123)
split <- sample.split(dataset2, SplitRatio=0.75)
train <- subset(dataset2, split==TRUE)
Length of logical index must be 1 or 12, not 15
test <- subset(dataset2, split==FALSE)
Length of logical index must be 1 or 12, not 15
Calculate univariate model using Weight 1 and M which were identified from first attempt of correlation.
univariateModel <- lm(formula = `Weight 1` ~M, data=train)
summary(univariateModel)
Call:
lm(formula = `Weight 1` ~ M, data = train)
Residuals:
Min 1Q Median 3Q Max
-1741.6 -805.2 177.0 553.7 2057.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1846.5275 1470.3847 -1.256 0.25587
M 3.8707 0.7062 5.481 0.00154 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1332 on 6 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared: 0.8335, Adjusted R-squared: 0.8058
F-statistic: 30.04 on 1 and 6 DF, p-value: 0.001543
Based on the summary the p-value for M (male) is 0.00154 - which means this could be meaningful.. the p-value for ‘Weight 1’ (heavy) is 0.25587 - which means this could be non-meaningful for the model.
Plot line on training data.
plot(x=train$`Weight 1`, y=train$M)
abline(univariateModel)
Generate prediction based on the model and test data and apply to test dataset.
test$M.predicted <- predict(univariateModel, test)
Visualise the relationship and fit a line based on regression..
Reload librarys
library(ggplot2)
library(plotly)
Generate graph
pl1 <-test %>%
ggplot(aes(y = M,x = `Weight 1`)) +
geom_point(alpha=0.5) +
stat_smooth() +
ylab('Actual value of M') +
xlab('Month')+
theme_bw()+
geom_abline(intercept = -1846, slope = 3.871) # Regression line added as intercept and slope due to no line shown. Still no line shown.
ggplotly(pl1)
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
span too small. fewer data values than degrees of freedom.pseudoinverse used at 2319.2neighborhood radius 3953.8reciprocal condition number 0There are other near singularities as well. 3.7588e+06NaNs producedspan too small. fewer data values than degrees of freedom.pseudoinverse used at 2319.2neighborhood radius 3953.8reciprocal condition number 0There are other near singularities as well. 3.7588e+06NaNs produced
Error in the above to show a actual Regression line..
Attempt two without a element of smoothness:
pl2 <-dataset2 %>%
ggplot(aes(y = M,x = Month)) +
geom_point(alpha=0.5) +
ylab('Actual value of M') +
xlab('Month')+
theme_bw()+
geom_abline(intercept = -1846, slope = 3.871)
ggplotly(pl2)
Failed..
Plot residules to look for any other pattern:
plot(univariateModel$residuals)
Potential pattern found in the residuals.
In summary the answer is yes according to the data. There is a relationship between the Scarf sales can be determined by Excel spreadsheets and looking at the trend of sales over time. The flow of sales in trend with the calendar months which we can look in more depth when we look at how region affects sales.
Regression procedure is correct however there is not enough data taken to create a accurate calculation, after discussion with decoded we need more data to be more effective.
Discussion with Daniel Noonan and during ‘drop in session’ which advised to expand the dataset from months to calendar weeks.
Import new figures from excel sheet into new data frame.
dataset3 <- read_xlsx("DF - regression_cleansed.xlsx", sheet=3)
-
/
Plot a graph to show trend of sales over the calendar weeks.
plot(x=dataset3$Calendar_Week, y=dataset3$`Count of Sales`)
Plot a graph to show the total sales by month.
plot(x=dataset3$Calendar_Week, y=dataset3$`Total Sales by Month`)
Initial trend is that sales of scarfs increase at then end of the year so that statement would be true from the data we have.
Plot the correlation of the new data frame.
corrplot(cor(dataset3))
Some observations of note is that we can determine from the data is that most people buy a scarf locally.
Experiment
ggplot(dataset3, aes(factor(dataset3$Calendar_Week), dataset3$`Count of Sales` / dataset3$`Weight 1`)) +
geom_line(size = 1, alpha = 0.8) +
geom_point(size = 2) +
xlab("Calendar week") +
ylab("Proportion of sales of Soft Scarfs")
Experiment
ggplot(dataset3, aes(dataset3$Calendar_Week)) + geom_point(aes(y=dataset3$APAC), color="red") + geom_point(aes(y=dataset3$`MIDDLE EAST`), color = "green") + geom_point(aes(y=dataset3$EUROPE), color="blue") + geom_point(aes(y=dataset3$AMERICAS), color="yellow") + geom_point(aes(y=dataset3$`REST OF EMEIA`), color="purple") +
geom_smooth(method='lm') +
xlab("Week") +
ylab("Proportion of sales for each Region")
Error: stat_smooth requires the following missing aesthetics: y
AIM: We predict based on last years data when to increase focus on Scarf sales.
Generated another dataframe with the data in a different format. This is due to experimentation on ‘Question - Quantity of scarfs sold increase in winter months?’ section.
dataset4 <- read_xlsx("DF - regression_cleansed.xlsx", sheet = 4)
-
/
Plot of graph with labels for Scarf sales for each region showing the trends.
labels_Regions <- c(
"EUROPE" = expression(EUROPE),
"MIDDLE_EAST" = expression(MIDDLE_EAST),
"APAC" = expression(APAC),
"AMERICAS" = expression(AMERICAS),
"REST_OF_EMEIA" = expression(REST_OF_EMEIA)
)
ggplot(dataset4, aes(dataset4$Week, dataset4$`Sales Count`, shape=dataset4$Region, colour=dataset4$Region, fill=dataset4$Region)) +
geom_smooth(method="lm", show.legend = FALSE) +
geom_point(size=3) +
theme_bw() +
xlab("Weeks") +
ylab("Sales Count") +
ggtitle("Scarf Sales by region") +
expand_limits(y=0) +
coord_fixed(ratio = 1/30, expand = TRUE, clip = "off") +
scale_shape(labels = labels_Regions) +
scale_colour_discrete(labels = labels_Regions) +
guides(fill = "none")
From this graph we can see that sales are higher in the Winter months particularly for APAC and EUROPE. A general trend of increased sales is observed in this time period too.
Another attempt performed after futher discussion with colleagues.
In this notebook we will be using data set aquired by Audrey on Scarf Sales. The dataset was a CSV and converted to XLSX. Each row contains information about the scarf sales. The columns relate to entries such as Scarf type, region bought, value of sale, etc.
We will use the Scarf Sales data to look at trends in Winter and compare with Summer. To gather potential variables for use in predicting trends and
Import the data for use into a dataframe.
dataset <- read_xlsx("DF - regression_cleansed.xlsx", sheet = 1)
-
|
/
-
\
|
/
-
\
|
/
-
\
|
/
-
\
|
/
-
\
|
/
Expecting numeric in AN2077 / R2077C40: got 'LARGE'
-
\
|
/
-
\
|
/
Expecting numeric in AN29273 / R29273C40: got 'LARGE'
-
\
|
/
-
\
Expecting numeric in AK50927 / R50927C37: got 'MID'Expecting numeric in AN51348 / R51348C40: got 'SMALL'
|
Expecting numeric in AN52714 / R52714C40: got 'SMALL'Expecting numeric in AK53564 / R53564C37: got 'MID'Expecting numeric in AK53821 / R53821C37: got 'MID'Expecting numeric in AN54041 / R54041C40: got 'SMALL'Expecting numeric in AK54063 / R54063C37: got 'MID'Expecting numeric in AK54301 / R54301C37: got 'MID'Expecting numeric in AK54429 / R54429C37: got 'MID'Expecting numeric in AK54430 / R54430C37: got 'MID'Expecting numeric in AN55260 / R55260C40: got 'SMALL'
/
Expecting numeric in AN57012 / R57012C40: got 'LARGE'
-
\
|
/
Expecting numeric in AN71255 / R71255C40: got 'SMALL'Expecting numeric in AN71627 / R71627C40: got 'LARGE'Expecting numeric in AN72368 / R72368C40: got 'LARGE'
-
\
Expecting numeric in AK77883 / R77883C37: got 'MID'Expecting numeric in AK79008 / R79008C37: got 'MID'Expecting numeric in AN79392 / R79392C40: got 'LARGE'Expecting numeric in AN79393 / R79393C40: got 'LARGE'Expecting numeric in AK79959 / R79959C37: got 'MID'Expecting numeric in AK79966 / R79966C37: got 'MID'
|
Expecting numeric in AK80289 / R80289C37: got 'MID'Expecting numeric in AN81083 / R81083C40: got 'LARGE'Expecting numeric in AK81211 / R81211C37: got 'MID'Expecting numeric in AN81418 / R81418C40: got 'LARGE'Expecting numeric in AN81437 / R81437C40: got 'LARGE'Expecting numeric in AK81867 / R81867C37: got 'MID'Expecting numeric in AN82033 / R82033C40: got 'SMALL'Expecting numeric in AK82314 / R82314C37: got 'MID'Expecting numeric in AK82434 / R82434C37: got 'MID'Expecting numeric in AK83019 / R83019C37: got 'MID'Expecting numeric in AN83266 / R83266C40: got 'LARGE'
/
Expecting numeric in AK83746 / R83746C37: got 'MID'Expecting numeric in AN84268 / R84268C40: got 'LARGE'Expecting numeric in AK84587 / R84587C37: got 'MID'Expecting numeric in AK84857 / R84857C37: got 'MID'Expecting numeric in AK84942 / R84942C37: got 'MID'Expecting numeric in AN85233 / R85233C40: got 'SMALL'Expecting numeric in AN85332 / R85332C40: got 'SMALL'Expecting numeric in AN85334 / R85334C40: got 'SMALL'Expecting numeric in AN85470 / R85470C40: got 'LARGE'Expecting numeric in AN85486 / R85486C40: got 'SMALL'Expecting numeric in AK85701 / R85701C37: got 'MID'Expecting numeric in AK85703 / R85703C37: got 'MID'Expecting numeric in AN85824 / R85824C40: got 'SMALL'Expecting numeric in AN86078 / R86078C40: got 'LARGE'Expecting numeric in AK86219 / R86219C37: got 'MID'Expecting numeric in AK86404 / R86404C37: got 'MID'Expecting numeric in AN86423 / R86423C40: got 'SMALL'Expecting numeric in AN86458 / R86458C40: got 'LARGE'Expecting numeric in AK86548 / R86548C37: got 'MID'Expecting numeric in AK86553 / R86553C37: got 'MID'Expecting numeric in AN86567 / R86567C40: got 'SMALL'Expecting numeric in AN86610 / R86610C40: got 'LARGE'Expecting numeric in AN86997 / R86997C40: got 'LARGE'Expecting numeric in AK87057 / R87057C37: got 'MID'Expecting numeric in AK87058 / R87058C37: got 'MID'
-
Expecting numeric in AN87117 / R87117C40: got 'LARGE'Expecting numeric in AK87118 / R87118C37: got 'MID'Expecting numeric in AK87144 / R87144C37: got 'MID'Expecting numeric in AN87147 / R87147C40: got 'LARGE'Expecting numeric in AN87280 / R87280C40: got 'LARGE'Expecting numeric in AK87478 / R87478C37: got 'MID'Expecting numeric in AK87552 / R87552C37: got 'MID'Expecting numeric in AN87592 / R87592C40: got 'LARGE'Expecting numeric in AK87719 / R87719C37: got 'MID'Expecting numeric in AK87727 / R87727C37: got 'MID'Expecting numeric in AK87837 / R87837C37: got 'MID'Expecting numeric in AN87852 / R87852C40: got 'SMALL'Expecting numeric in AN88076 / R88076C40: got 'LARGE'Expecting numeric in AN88289 / R88289C40: got 'LARGE'Expecting numeric in AK88355 / R88355C37: got 'MID'Expecting numeric in AK88420 / R88420C37: got 'MID'Expecting numeric in AK88424 / R88424C37: got 'MID'Expecting numeric in AN88650 / R88650C40: got 'SMALL'Expecting numeric in AK88663 / R88663C37: got 'MID'Expecting numeric in AN88856 / R88856C40: got 'LARGE'Expecting numeric in AN89042 / R89042C40: got 'LARGE'Expecting numeric in AN89115 / R89115C40: got 'SMALL'Expecting numeric in AK89273 / R89273C37: got 'MID'Expecting numeric in AK89291 / R89291C37: got 'MID'Expecting numeric in AK89336 / R89336C37: got 'MID'Expecting numeric in AK89461 / R89461C37: got 'MID'Expecting numeric in AK89484 / R89484C37: got 'MID'Expecting numeric in AK89558 / R89558C37: got 'MID'Expecting numeric in AN89715 / R89715C40: got 'LARGE'Expecting numeric in AN89778 / R89778C40: got 'SMALL'Expecting numeric in AK89804 / R89804C37: got 'MID'Expecting numeric in AK89890 / R89890C37: got 'MID'Expecting numeric in AN90019 / R90019C40: got 'SMALL'Expecting numeric in AN90220 / R90220C40: got 'LARGE'Expecting numeric in AN90221 / R90221C40: got 'LARGE'Expecting numeric in AN90378 / R90378C40: got 'LARGE'Expecting numeric in AN90441 / R90441C40: got 'LARGE'Expecting numeric in AK90463 / R90463C37: got 'MID'Expecting numeric in AK90495 / R90495C37: got 'MID'Expecting numeric in AK90524 / R90524C37: got 'MID'
\
Expecting numeric in AK90597 / R90597C37: got 'MID'Expecting numeric in AK90734 / R90734C37: got 'MID'Expecting numeric in AN90966 / R90966C40: got 'SMALL'Expecting numeric in AK91433 / R91433C37: got 'MID'Expecting numeric in AN91445 / R91445C40: got 'LARGE'Expecting numeric in AK91450 / R91450C37: got 'MID'Expecting numeric in AN91618 / R91618C40: got 'LARGE'Expecting numeric in AK91901 / R91901C37: got 'MID'Expecting numeric in AN92062 / R92062C40: got 'SMALL'Expecting numeric in AK92195 / R92195C37: got 'MID'Expecting numeric in AK92263 / R92263C37: got 'MID'Expecting numeric in AK92268 / R92268C37: got 'MID'Expecting numeric in AK92429 / R92429C37: got 'MID'Expecting numeric in AK92431 / R92431C37: got 'MID'Expecting numeric in AK92603 / R92603C37: got 'MID'Expecting numeric in AN92678 / R92678C40: got 'SMALL'Expecting numeric in AK92783 / R92783C37: got 'MID'Expecting numeric in AN92934 / R92934C40: got 'LARGE'Expecting numeric in AN92943 / R92943C40: got 'LARGE'Expecting numeric in AN92993 / R92993C40: got 'LARGE'Expecting numeric in AN93011 / R93011C40: got 'LARGE'Expecting numeric in AK93293 / R93293C37: got 'MID'Expecting numeric in AK93625 / R93625C37: got 'MID'Expecting numeric in AN93748 / R93748C40: got 'LARGE'Expecting numeric in AN93754 / R93754C40: got 'SMALL'Expecting numeric in AK93816 / R93816C37: got 'MID'Expecting numeric in AN93830 / R93830C40: got 'LARGE'Expecting numeric in AK93880 / R93880C37: got 'MID'Expecting numeric in AK94054 / R94054C37: got 'MID'
|
Expecting numeric in AK94071 / R94071C37: got 'MID'Expecting numeric in AK94077 / R94077C37: got 'MID'Expecting numeric in AK94079 / R94079C37: got 'MID'Expecting numeric in AN94113 / R94113C40: got 'SMALL'Expecting numeric in AN94118 / R94118C40: got 'LARGE'Expecting numeric in AN94251 / R94251C40: got 'LARGE'Expecting numeric in AK94432 / R94432C37: got 'MID'Expecting numeric in AN94620 / R94620C40: got 'LARGE'Expecting numeric in AN94637 / R94637C40: got 'SMALL'Expecting numeric in AN94643 / R94643C40: got 'SMALL'Expecting numeric in AK94766 / R94766C37: got 'MID'Expecting numeric in AK94935 / R94935C37: got 'MID'Expecting numeric in AK94963 / R94963C37: got 'MID'Expecting numeric in AK95020 / R95020C37: got 'MID'Expecting numeric in AK95126 / R95126C37: got 'MID'Expecting numeric in AN95215 / R95215C40: got 'LARGE'Expecting numeric in AN95619 / R95619C40: got 'SMALL'Expecting numeric in AK95746 / R95746C37: got 'MID'Expecting numeric in AN95756 / R95756C40: got 'LARGE'Expecting numeric in AN95927 / R95927C40: got 'LARGE'Expecting numeric in AN95928 / R95928C40: got 'LARGE'Expecting numeric in AN95979 / R95979C40: got 'LARGE'Expecting numeric in AK96049 / R96049C37: got 'MID'Expecting numeric in AN96064 / R96064C40: got 'LARGE'Expecting numeric in AN96148 / R96148C40: got 'LARGE'Expecting numeric in AN96153 / R96153C40: got 'LARGE'Expecting numeric in AN96167 / R96167C40: got 'LARGE'Expecting numeric in AN96274 / R96274C40: got 'LARGE'Expecting numeric in AN96301 / R96301C40: got 'SMALL'Expecting numeric in AN96495 / R96495C40: got 'LARGE'Expecting numeric in AK96644 / R96644C37: got 'MID'Expecting numeric in AK96685 / R96685C37: got 'MID'Expecting numeric in AN96739 / R96739C40: got 'SMALL'Expecting numeric in AN96741 / R96741C40: got 'LARGE'Expecting numeric in AK97083 / R97083C37: got 'MID'Expecting numeric in AN97088 / R97088C40: got 'LARGE'Expecting numeric in AN97161 / R97161C40: got 'LARGE'Expecting numeric in AN97206 / R97206C40: got 'SMALL'Expecting numeric in AN97507 / R97507C40: got 'SMALL'
/
Expecting numeric in AN97681 / R97681C40: got 'LARGE'Expecting numeric in AN97733 / R97733C40: got 'LARGE'Expecting numeric in AN97819 / R97819C40: got 'LARGE'Expecting numeric in AK97915 / R97915C37: got 'MID'Expecting numeric in AN97918 / R97918C40: got 'SMALL'Expecting numeric in AN97937 / R97937C40: got 'LARGE'Expecting numeric in AK98083 / R98083C37: got 'MID'Expecting numeric in AK98084 / R98084C37: got 'MID'Expecting numeric in AK98087 / R98087C37: got 'MID'Expecting numeric in AN98104 / R98104C40: got 'SMALL'Expecting numeric in AN98154 / R98154C40: got 'SMALL'Expecting numeric in AK98332 / R98332C37: got 'MID'Expecting numeric in AN98514 / R98514C40: got 'LARGE'Expecting numeric in AK98903 / R98903C37: got 'MID'Expecting numeric in AN99320 / R99320C40: got 'LARGE'Expecting numeric in AN99674 / R99674C40: got 'LARGE'Expecting numeric in AK99707 / R99707C37: got 'MID'Expecting numeric in AN99727 / R99727C40: got 'SMALL'Expecting numeric in AK99874 / R99874C37: got 'MID'Expecting numeric in AN100125 / R100125C40: got 'LARGE'Expecting numeric in AK100129 / R100129C37: got 'MID'Expecting numeric in AK100182 / R100182C37: got 'MID'Expecting numeric in AK100243 / R100243C37: got 'MID'Expecting numeric in AN100249 / R100249C40: got 'LARGE'Expecting numeric in AK100619 / R100619C37: got 'MID'Expecting numeric in AN100898 / R100898C40: got 'LARGE'
-
Expecting numeric in AK101029 / R101029C37: got 'MID'Expecting numeric in AN101083 / R101083C40: got 'LARGE'Expecting numeric in AN101096 / R101096C40: got 'SMALL'Expecting numeric in AN101157 / R101157C40: got 'SMALL'Expecting numeric in AN101182 / R101182C40: got 'SMALL'Expecting numeric in AK101315 / R101315C37: got 'MID'Expecting numeric in AK101522 / R101522C37: got 'MID'Expecting numeric in AN101839 / R101839C40: got 'LARGE'Expecting numeric in AK101950 / R101950C37: got 'MID'Expecting numeric in AN102190 / R102190C40: got 'LARGE'Expecting numeric in AN102199 / R102199C40: got 'SMALL'Expecting numeric in AN102259 / R102259C40: got 'LARGE'Expecting numeric in AN102327 / R102327C40: got 'SMALL'Expecting numeric in AK102543 / R102543C37: got 'MID'Expecting numeric in AN102621 / R102621C40: got 'SMALL'Expecting numeric in AK102632 / R102632C37: got 'MID'Expecting numeric in AK102931 / R102931C37: got 'MID'Expecting numeric in AK103004 / R103004C37: got 'MID'Expecting numeric in AN103024 / R103024C40: got 'SMALL'Expecting numeric in AK103164 / R103164C37: got 'MID'Expecting numeric in AK103174 / R103174C37: got 'MID'Expecting numeric in AK103348 / R103348C37: got 'MID'Expecting numeric in AN103640 / R103640C40: got 'LARGE'Expecting numeric in AN103712 / R103712C40: got 'LARGE'Expecting numeric in AN103804 / R103804C40: got 'LARGE'Expecting numeric in AN103838 / R103838C40: got 'LARGE'Expecting numeric in AK104086 / R104086C37: got 'MID'Expecting numeric in AK104217 / R104217C37: got 'MID'Expecting numeric in AN104428 / R104428C40: got 'LARGE'
\
Expecting numeric in AN104496 / R104496C40: got 'LARGE'Expecting numeric in AN104553 / R104553C40: got 'LARGE'Expecting numeric in AK104655 / R104655C37: got 'MID'Expecting numeric in AN104958 / R104958C40: got 'LARGE'Expecting numeric in AK104967 / R104967C37: got 'MID'Expecting numeric in AK105027 / R105027C37: got 'MID'Expecting numeric in AK105284 / R105284C37: got 'MID'Expecting numeric in AK105288 / R105288C37: got 'MID'Expecting numeric in AK105542 / R105542C37: got 'MID'Expecting numeric in AK105543 / R105543C37: got 'MID'Expecting numeric in AK105586 / R105586C37: got 'MID'Expecting numeric in AK105759 / R105759C37: got 'MID'Expecting numeric in AN105788 / R105788C40: got 'SMALL'Expecting numeric in AK105789 / R105789C37: got 'MID'Expecting numeric in AN105847 / R105847C40: got 'LARGE'Expecting numeric in AN105909 / R105909C40: got 'LARGE'Expecting numeric in AK105978 / R105978C37: got 'MID'Expecting numeric in AN106222 / R106222C40: got 'LARGE'Expecting numeric in AK106248 / R106248C37: got 'MID'Expecting numeric in AK106317 / R106317C37: got 'MID'Expecting numeric in AN106761 / R106761C40: got 'LARGE'Expecting numeric in AN106762 / R106762C40: got 'LARGE'Expecting numeric in AK106932 / R106932C37: got 'MID'Expecting numeric in AN107163 / R107163C40: got 'LARGE'Expecting numeric in AN107247 / R107247C40: got 'LARGE'Expecting numeric in AN107454 / R107454C40: got 'SMALL'Expecting numeric in AN107458 / R107458C40: got 'LARGE'Expecting numeric in AN107623 / R107623C40: got 'LARGE'Expecting numeric in AN107783 / R107783C40: got 'SMALL'Expecting numeric in AN107827 / R107827C40: got 'LARGE'Expecting numeric in AN107913 / R107913C40: got 'LARGE'
|
Expecting numeric in AN108061 / R108061C40: got 'LARGE'Expecting numeric in AK108130 / R108130C37: got 'MID'Expecting numeric in AN108633 / R108633C40: got 'LARGE'Expecting numeric in AN108781 / R108781C40: got 'SMALL'Expecting numeric in AN108785 / R108785C40: got 'LARGE'Expecting numeric in AN108874 / R108874C40: got 'LARGE'Expecting numeric in AK108958 / R108958C37: got 'MID'Expecting numeric in AN109058 / R109058C40: got 'SMALL'Expecting numeric in AN109259 / R109259C40: got 'LARGE'Expecting numeric in AN109461 / R109461C40: got 'LARGE'Expecting numeric in AK109587 / R109587C37: got 'MID'Expecting numeric in AK109679 / R109679C37: got 'MID'Expecting numeric in AK109681 / R109681C37: got 'MID'Expecting numeric in AK109965 / R109965C37: got 'MID'Expecting numeric in AN110040 / R110040C40: got 'LARGE'Expecting numeric in AN110082 / R110082C40: got 'SMALL'Expecting numeric in AK110100 / R110100C37: got 'MID'Expecting numeric in AK110303 / R110303C37: got 'MID'Expecting numeric in AN110802 / R110802C40: got 'LARGE'Expecting numeric in AK110858 / R110858C37: got 'MID'Expecting numeric in AN110952 / R110952C40: got 'LARGE'
/
Expecting numeric in AK111603 / R111603C37: got 'MID'Expecting numeric in AN111622 / R111622C40: got 'LARGE'Expecting numeric in AN111807 / R111807C40: got 'SMALL'Expecting numeric in AK111892 / R111892C37: got 'MID'Expecting numeric in AN111893 / R111893C40: got 'SMALL'Expecting numeric in AN112362 / R112362C40: got 'LARGE'Expecting numeric in AN112648 / R112648C40: got 'SMALL'Expecting numeric in AN112897 / R112897C40: got 'LARGE'Expecting numeric in AN112954 / R112954C40: got 'LARGE'Expecting numeric in AK112966 / R112966C37: got 'MID'Expecting numeric in AK112983 / R112983C37: got 'MID'Expecting numeric in AK113400 / R113400C37: got 'MID'Expecting numeric in AK113446 / R113446C37: got 'MID'Expecting numeric in AN114030 / R114030C40: got 'LARGE'Expecting numeric in AN114180 / R114180C40: got 'LARGE'Expecting numeric in AN114195 / R114195C40: got 'SMALL'Expecting numeric in AN114390 / R114390C40: got 'LARGE'Expecting numeric in AN114561 / R114561C40: got 'LARGE'Expecting numeric in AK114692 / R114692C37: got 'MID'Expecting numeric in AN114919 / R114919C40: got 'LARGE'Expecting numeric in AK114950 / R114950C37: got 'MID'Expecting numeric in AN114968 / R114968C40: got 'LARGE'Expecting numeric in AN114969 / R114969C40: got 'LARGE'
-
Expecting numeric in AN114999 / R114999C40: got 'LARGE'Expecting numeric in AN115336 / R115336C40: got 'SMALL'Expecting numeric in AK115582 / R115582C37: got 'MID'Expecting numeric in AK115666 / R115666C37: got 'MID'Expecting numeric in AK116011 / R116011C37: got 'MID'Expecting numeric in AN116084 / R116084C40: got 'SMALL'Expecting numeric in AK116690 / R116690C37: got 'MID'Expecting numeric in AK116726 / R116726C37: got 'MID'Expecting numeric in AN116790 / R116790C40: got 'SMALL'Expecting numeric in AK116835 / R116835C37: got 'MID'
\
View the data frame, statistics and column information.
summary(dataset)
record_type trans_id cal_week cal_month cal_month_DF cal_year trans_channel store_no store_country_code
Length:116890 Min. :2.020e+16 Min. : 1.00 Min. : 1.000 Min. :1.0 Min. :2017 Length:116890 Min. :1000 Length:116890
Class :character 1st Qu.:2.020e+19 1st Qu.:15.00 1st Qu.: 4.000 1st Qu.:1.0 1st Qu.:2018 Class :character 1st Qu.:1032 Class :character
Mode :character Median :2.020e+19 Median :32.00 Median : 8.000 Median :3.0 Median :2018 Mode :character Median :1128 Mode :character
Mean :6.266e+21 Mean :29.37 Mean : 7.176 Mean :2.6 Mean :2018 Mean :1461
3rd Qu.:2.020e+20 3rd Qu.:43.00 3rd Qu.:10.000 3rd Qu.:4.0 3rd Qu.:2018 3rd Qu.:1255
Max. :2.020e+23 Max. :52.00 Max. :12.000 Max. :4.0 Max. :2018 Max. :7701
store_country store_region store_subregion store_climate parent_customer_no parent_customer_id gender cust_country_code
Length:116890 Length:116890 Length:116890 Length:116890 Min. :1.000e+06 Length:116890 Length:116890 Length:116890
Class :character Class :character Class :character Class :character 1st Qu.:1.000e+11 Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Median :3.120e+11 Mode :character Mode :character Mode :character
Mean :4.138e+11
3rd Qu.:8.150e+11
Max. :1.000e+12
NA's :45860
cust_country cust_market cust_region local_tourist_flag actualsalesamt_central quantity actualsalesamt_central_DF
Length:116890 Length:116890 Length:116890 Length:116890 Min. :-1400.0 Min. :-5.0000 Min. : 0.769
Class :character Class :character Class :character Class :character 1st Qu.: 276.9 1st Qu.: 1.0000 1st Qu.:280.000
Mode :character Mode :character Mode :character Mode :character Median : 311.0 Median : 1.0000 Median :311.023
Mean : 298.6 Mean : 0.9428 Mean :317.399
3rd Qu.: 350.0 3rd Qu.: 1.0000 3rd Qu.:350.000
Max. : 2475.0 Max. :10.0000 Max. :995.000
discount_amount_central_constant prod_desc prod_sku prod_desc_detailed level3_desc_prev level4_desc_prev level2_desc
Length:116890 Length:116890 Min. :32013061 Length:116890 Length:116890 Length:116890 Length:116890
Class :character Class :character 1st Qu.:39295221 Class :character Class :character Class :character Class :character
Mode :character Mode :character Median :40013631 Mode :character Mode :character Mode :character Mode :character
Mean :40411564
3rd Qu.:40654211
Max. :80100471
level25_desc level3_desc level4_desc level5_desc weight_softs weight_softs_DF shape_accessories
Length:116890 Length:116890 Length:116890 Length:116890 Length:116890 Min. :1.000 Length:116890
Class :character Class :character Class :character Class :character Class :character 1st Qu.:1.000 Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character Median :1.000 Mode :character
Mean :1.414
3rd Qu.:2.000
Max. :2.000
NA's :3605
sizegroupings_accessories sizegroupings_accessories_DF subworld_accessories
Length:116890 Min. : 1.00 Length:116890
Class :character 1st Qu.:11.00 Class :character
Mode :character Median :19.00 Mode :character
Mean :20.36
3rd Qu.:32.00
Max. :40.00
NA's :5334
str(dataset)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 116890 obs. of 41 variables:
$ record_type : chr "ITEM" "ITEM" "ITEM" "ITEM" ...
$ trans_id : num 2.02e+19 2.02e+19 2.02e+19 2.02e+19 2.02e+19 ...
$ cal_week : num 45 45 45 45 45 45 45 45 45 45 ...
$ cal_month : num 11 11 11 11 11 11 11 11 11 11 ...
$ cal_month_DF : num 4 4 4 4 4 4 4 4 4 4 ...
$ cal_year : num 2018 2018 2018 2018 2018 ...
$ trans_channel : chr "MAINLINE Direct" "MAINLINE Direct" "MAINLINE Direct" "MAINLINE Direct" ...
$ store_no : num 7159 7159 7159 7159 7152 ...
$ store_country_code : chr "SA" "SA" "SA" "SA" ...
$ store_country : chr "Saudi Arabia" "Saudi Arabia" "Saudi Arabia" "Saudi Arabia" ...
$ store_region : chr "EMEIA" "EMEIA" "EMEIA" "EMEIA" ...
$ store_subregion : chr "Middle East" "Middle East" "Middle East" "Middle East" ...
$ store_climate : chr "Tropical" "Tropical" "Tropical" "Tropical" ...
$ parent_customer_no : num 1.00e+11 1.00e+11 1.00e+11 8.15e+11 8.15e+11 ...
$ parent_customer_id : chr "NULL" "NULL" "NULL" "NULL" ...
$ gender : chr "M" "F" "F" "M" ...
$ cust_country_code : chr "CN" "SA" "SA" "CS" ...
$ cust_country : chr "China" "Saudi Arabia" "Saudi Arabia" "Serbia/Monten." ...
$ cust_market : chr "APAC" "MIDDLE EAST" "MIDDLE EAST" "EUROPE" ...
$ cust_region : chr "Asia" "EMEIA" "EMEIA" NA ...
$ local_tourist_flag : chr "Tourist" "Local" "Local" "Tourist" ...
$ actualsalesamt_central : num -470 -316 316 316 351 ...
$ quantity : num -1 -1 1 1 1 1 1 1 1 1 ...
$ actualsalesamt_central_DF : num 470 316 316 316 351 ...
$ discount_amount_central_constant: chr "NULL" "NULL" "NULL" "NULL" ...
$ prod_desc : chr "HALF MEGA CK" "GAUZE GIANT CHK" "GAUZE GIANT CHK" "GAUZE GIANT CHK" ...
$ prod_sku : num 40310791 37432321 39628631 40013631 40583521 ...
$ prod_desc_detailed : chr "A:MU HALF MEGA CK:CS:2310B, 1" "A:MU GAUZE GIANT CHK:AALKT:2310C, 1" "A:MU GAUZE GIANT CHK:AALKT:2501B, 1" "A:MU GAUZE GIANT CHK:AALKT:5317B, 1" ...
$ level3_desc_prev : chr "SCARVES" "SCARVES" "SCARVES" "SCARVES" ...
$ level4_desc_prev : chr "SCARVES" "SCARVES" "SCARVES" "SCARVES" ...
$ level2_desc : chr "ACCESSORIES" "ACCESSORIES" "ACCESSORIES" "ACCESSORIES" ...
$ level25_desc : chr "SOFT ACCESSORIES" "SOFT ACCESSORIES" "SOFT ACCESSORIES" "SOFT ACCESSORIES" ...
$ level3_desc : chr "SCARVES" "SCARVES" "SCARVES" "SCARVES" ...
$ level4_desc : chr "SCARVES" "SCARVES" "SCARVES" "SCARVES" ...
$ level5_desc : chr "CASHMERE" "BLENDS" "BLENDS" "BLENDS" ...
$ weight_softs : chr "HEAVY" "LIGHT" "LIGHT" "LIGHT" ...
$ weight_softs_DF : num 1 2 2 2 1 2 2 2 2 2 ...
$ shape_accessories : chr "CHECK CASHMERE" "GAUZE" "GAUZE" "GAUZE" ...
$ sizegroupings_accessories : chr "200 X 36 CM" "220 X 70 CM" "220 X 70 CM" "220 X 70 CM" ...
$ sizegroupings_accessories_DF : num 19 32 32 32 11 24 32 32 32 37 ...
$ subworld_accessories : chr "HALF MEGA CHECK CASHMERE" "GAUZE" "GAUZE" "GAUZE" ...
Remove all null data (light easy clean).
datasetNN <- na.omit(dataset)
Generate test and train datasets to be used to test predictions.
set.seed(123)
sampleNN <- sample.split(datasetNN2, SplitRatio = 2/3)
trainNN <- subset(datasetNN2, sampleNN==TRUE)
Length of logical index must be 1 or 52, not 15
testNN <- subset(datasetNN2, sampleNN==FALSE)
Length of logical index must be 1 or 52, not 15
Perform a light test to see if there is any correlation between the seasons (Month_DF) and the quanty of sales:
cor.test( ~ cal_month_DF + quantity,
data=datasetNN,
method = "pearson",
conf.level = 0.95)
Pearson's product-moment correlation
data: cal_month_DF and quantity
t = 5.2897, df = 42965, p-value = 1.231e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.01605953 0.03495815
sample estimates:
cor
0.02551112
There doesnt appear to be a Correlation.
Created a summary sheet in Excel to generate statistics on the data captured. I then produce a correlation graph to show any correaltion without fixating on a month or any factor.
library(corrplot)
library(readxl)
datasetNN2 <- read_xlsx("DF - regression_cleansed.xlsx", sheet = 3)
-
/
cor.test( ~ datasetNN2$Month + datasetNN2$`Count of Sales`,
data=datasetNN2,
method = "pearson",
conf.level = 0.95)
Pearson's product-moment correlation
data: datasetNN2$Month and datasetNN2$`Count of Sales`
t = 1.4896, df = 50, p-value = 0.1426
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.07074981 0.45351890
sample estimates:
cor
0.2061304
corrplot(cor(datasetNN2))
I create a Univariate model based on Count of sales against Month.
univariateModel <- lm(formula = `Count of Sales` ~ Month, data=trainNN)
summary(univariateModel)
Call:
lm(formula = `Count of Sales` ~ Month, data = trainNN)
Residuals:
Min 1Q Median 3Q Max
-852.9 -391.9 -172.4 186.7 1988.9
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1825.2 282.0 6.472 2.78e-07 ***
Month 169.9 104.0 1.634 0.112
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 677.7 on 32 degrees of freedom
Multiple R-squared: 0.07697, Adjusted R-squared: 0.04812
F-statistic: 2.668 on 1 and 32 DF, p-value: 0.1122
Visualisation of the data as a whole. Grouped by the month (Month DF) with a regression line applied (via Univariate model).
plot(y=trainNN$`Count of Sales`, x=trainNN$Month)
abline(univariateModel)
Lets go back to the main objective of finding the variables in summer and winter and see if we can create a good predictor of sales.
Winter data (Month_DF 1) extracted and stored and displayed in a statistical form.
WinterDS <- datasetNN2 %>%
filter(Month == 1)
ggplot(WinterDS, aes(x = Calendar_Week, y = `Count of Sales`))+
geom_point()
Using correlation module to look for any correaltions in the Winter data.
corrplot(cor(WinterDS))
the standard deviation is zero
When looking at the data we see a high correlation with the calendar week and europe, middle east and Americas. No or litte correaltion in APAC and Tourist values.
Correlation figures for the columns:
cor(WinterDS)
the standard deviation is zero
Calendar_Week Month Count of Sales Total Sales by Month Weight 1 Weight 2 M F Local Tourist APAC
Calendar_Week 1.0000000 NA 0.6456269 0.6673667 0.6926321 0.4909783 0.6038672 0.4904480 0.7039725 0.1544524 -0.1082391
Month NA 1 NA NA NA NA NA NA NA NA NA
Count of Sales 0.6456269 NA 1.0000000 0.9993017 0.9877801 0.9676834 0.9534134 0.9490054 0.8373170 0.7589829 0.5744353
Total Sales by Month 0.6673667 NA 0.9993017 1.0000000 0.9875475 0.9628360 0.9493097 0.9423018 0.8348103 0.7500149 0.5577827
Weight 1 0.6926321 NA 0.9877801 0.9875475 1.0000000 0.9300030 0.9662194 0.9281758 0.9012113 0.6842500 0.4945985
Weight 2 0.4909783 NA 0.9676834 0.9628360 0.9300030 1.0000000 0.9143348 0.9654887 0.7160064 0.8834821 0.7373838
M 0.6038672 NA 0.9534134 0.9493097 0.9662194 0.9143348 1.0000000 0.9365216 0.8818293 0.7000303 0.5604048
F 0.4904480 NA 0.9490054 0.9423018 0.9281758 0.9654887 0.9365216 1.0000000 0.7483216 0.8609939 0.7384133
Local 0.7039725 NA 0.8373170 0.8348103 0.9012113 0.7160064 0.8818293 0.7483216 1.0000000 0.3487112 0.1764516
Tourist 0.1544524 NA 0.7589829 0.7500149 0.6842500 0.8834821 0.7000303 0.8609939 0.3487112 1.0000000 0.9506278
APAC -0.1082391 NA 0.5744353 0.5577827 0.4945985 0.7373838 0.5604048 0.7384133 0.1764516 0.9506278 1.0000000
MIDDLE EAST 0.6432394 NA 0.7270900 0.7294450 0.7813187 0.6308230 0.6715995 0.6254598 0.8360889 0.3236790 0.1229004
EUROPE 0.7163675 NA 0.8611850 0.8596299 0.9177137 0.7386886 0.8938425 0.7691008 0.9947278 0.3722778 0.1913851
AMERICAS 0.6290157 NA 0.7111019 0.7275363 0.6765409 0.7236570 0.5839441 0.6627810 0.3967417 0.6749384 0.4467771
REST OF EMEIA 0.4385359 NA 0.5720883 0.5687738 0.5877274 0.5712813 0.6200645 0.5217229 0.5941169 0.3865083 0.2700527
MIDDLE EAST EUROPE AMERICAS REST OF EMEIA
Calendar_Week 0.6432394 0.7163675 0.6290157 0.4385359
Month NA NA NA NA
Count of Sales 0.7270900 0.8611850 0.7111019 0.5720883
Total Sales by Month 0.7294450 0.8596299 0.7275363 0.5687738
Weight 1 0.7813187 0.9177137 0.6765409 0.5877274
Weight 2 0.6308230 0.7386886 0.7236570 0.5712813
M 0.6715995 0.8938425 0.5839441 0.6200645
F 0.6254598 0.7691008 0.6627810 0.5217229
Local 0.8360889 0.9947278 0.3967417 0.5941169
Tourist 0.3236790 0.3722778 0.6749384 0.3865083
APAC 0.1229004 0.1913851 0.4467771 0.2700527
MIDDLE EAST 1.0000000 0.8224254 0.4786328 0.3679651
EUROPE 0.8224254 1.0000000 0.4134063 0.5550544
AMERICAS 0.4786328 0.4134063 1.0000000 0.4856945
REST OF EMEIA 0.3679651 0.5550544 0.4856945 1.0000000
Create a univariate model with the following predictor variables: Target Variable: Count of Sales Predictor Variable: Middle East, Europe, Americas, Local, weight 1
lm([target variable] ~ [predictor variables], data = [data source])
univariateModelWinter <- lm(formula = `Count of Sales` ~ `MIDDLE EAST`+EUROPE+AMERICAS+Local+`Weight 1`, data=trainNN)
summary(univariateModelWinter)
Call:
lm(formula = `Count of Sales` ~ `MIDDLE EAST` + EUROPE + AMERICAS +
Local + `Weight 1`, data = trainNN)
Residuals:
Min 1Q Median 3Q Max
-440.01 -103.27 5.15 118.71 314.58
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 403.4538 160.9718 2.506 0.0183 *
`MIDDLE EAST` 2.4586 2.6042 0.944 0.3532
EUROPE -0.5253 1.2137 -0.433 0.6685
AMERICAS 5.2698 1.0720 4.916 3.49e-05 ***
Local 0.1705 1.1181 0.153 0.8799
`Weight 1` 0.9248 0.1259 7.343 5.38e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 198.7 on 28 degrees of freedom
Multiple R-squared: 0.9306, Adjusted R-squared: 0.9182
F-statistic: 75.06 on 5 and 28 DF, p-value: 2.447e-15
Looking at the P values:
The p-value for Middle East is 0.3532. This means a 35% chance this predictor is not meaningful. The p-value for EUROPE is 0.6685. This means a 67% chance this predictor is no meaningful. The p-value for AMERICAS is 3.49e-05. A very small value means that age is probably an excellent addition to the model. The p-value for Local is 0.8799. This means a 88% chance this predictor is no meaningful. The p-value for Weight 1 is 5.38e-08. A very small value means that age is probably an excellent addition to the model.
Americas and Weight 1 appear to be good variables for a prediction.
Lets check the residuals to see any trend.
plot(univariateModelWinter$residuals)
We can see a few curves in the residuals.
If we run a univariate Model again this time with the two predictors we get the following:
univariateModelWinter1 <- lm(formula = `Count of Sales` ~ (AMERICAS + `Weight 1`), data=trainNN)
summary(univariateModelWinter1)
Call:
lm(formula = `Count of Sales` ~ (AMERICAS + `Weight 1`), data = trainNN)
Residuals:
Min 1Q Median 3Q Max
-402.97 -134.12 36.91 133.27 371.38
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 524.25324 106.54849 4.920 2.70e-05 ***
AMERICAS 5.04731 0.89093 5.665 3.19e-06 ***
`Weight 1` 0.86345 0.06075 14.213 4.02e-15 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 192.8 on 31 degrees of freedom
Multiple R-squared: 0.9277, Adjusted R-squared: 0.923
F-statistic: 198.8 on 2 and 31 DF, p-value: < 2.2e-16
When comparing the ‘Multiples in R-squared’ from both calculations we see that there is not much difference between the two but the Adjusted R-squared value accounts for more focused prediction.
Lets check the residuals:
plot(univariateModelWinter1$residuals)
When checking the residuals there is a slight pattern.
Lets plot the prediction against the data to see how much we capture.
plot(y=trainNN$`Count of Sales`, x=trainNN$Calendar_Week)
abline(univariateModelWinter1)
only using the first two of 3 regression coefficients
The regression line doesnt appear on the graph. with error: “only using the first two of 3 regression coefficients”
Summer data (Month_DF 3) extracted and stored and displayed in a statistical form.
SummerDS <- datasetNN2 %>%
filter(Month == 3)
ggplot(SummerDS, aes(x = Calendar_Week, y = `Count of Sales`))+
geom_point()
Using correlation module to look for any correaltions in the Summer data.
corrplot(cor(SummerDS))
the standard deviation is zero
When looking at the data we see a high correlation with the calendar week and Weight 1, Female, Touist and APAC values. No or weak correaltion in other values and a weird negative correlation with Local value.
Correlation figures for the columns:
cor(SummerDS)
the standard deviation is zero
Calendar_Week Month Count of Sales Total Sales by Month Weight 1 Weight 2 M F Local Tourist
Calendar_Week 1.0000000 NA 0.79019739 0.85760374 0.94876130 0.18293156 0.4932981 0.84356122 -0.53584239 0.950308394
Month NA 1 NA NA NA NA NA NA NA NA
Count of Sales 0.7901974 NA 1.00000000 0.98576558 0.90981466 0.72121734 0.6131608 0.98149778 -0.02907363 0.892413333
Total Sales by Month 0.8576037 NA 0.98576558 1.00000000 0.95852079 0.60954700 0.5939882 0.98316300 -0.16768582 0.937865905
Weight 1 0.9487613 NA 0.90981466 0.95852079 1.00000000 0.36934489 0.5827722 0.93653500 -0.38722047 0.981682792
Weight 2 0.1829316 NA 0.72121734 0.60954700 0.36934489 1.00000000 0.3899763 0.63190531 0.56231999 0.361398612
M 0.4932981 NA 0.61316080 0.59398816 0.58277223 0.38997629 1.0000000 0.60337615 0.10130203 0.661747222
F 0.8435612 NA 0.98149778 0.98316300 0.93653500 0.63190531 0.6033761 1.00000000 -0.08401892 0.924021546
Local -0.5358424 NA -0.02907363 -0.16768582 -0.38722047 0.56231999 0.1013020 -0.08401892 1.00000000 -0.399008297
Tourist 0.9503084 NA 0.89241333 0.93786590 0.98168279 0.36139861 0.6617472 0.92402155 -0.39900830 1.000000000
APAC 0.9575104 NA 0.86669661 0.90818487 0.96130102 0.33494918 0.6727318 0.90052164 -0.40027421 0.991040749
MIDDLE EAST -0.2848623 NA 0.10170402 -0.02582763 -0.23448450 0.59184109 0.1937494 0.06880285 0.87565736 -0.218744371
EUROPE -0.2462147 NA 0.34836177 0.23361661 0.03936285 0.71518883 0.3300980 0.26755316 0.74582435 0.003230515
AMERICAS -0.3500960 NA -0.21672127 -0.18841352 -0.25567276 -0.04138769 -0.4363857 -0.22705346 0.07404444 -0.334413468
REST OF EMEIA -0.1868129 NA 0.12947249 0.04769788 -0.10778894 0.43864408 0.2491044 0.15381937 0.80236148 -0.112578203
APAC MIDDLE EAST EUROPE AMERICAS REST OF EMEIA
Calendar_Week 0.95751039 -0.28486225 -0.246214702 -0.35009597 -0.18681291
Month NA NA NA NA NA
Count of Sales 0.86669661 0.10170402 0.348361768 -0.21672127 0.12947249
Total Sales by Month 0.90818487 -0.02582763 0.233616609 -0.18841352 0.04769788
Weight 1 0.96130102 -0.23448450 0.039362851 -0.25567276 -0.10778894
Weight 2 0.33494918 0.59184109 0.715188834 -0.04138769 0.43864408
M 0.67273184 0.19374939 0.330098018 -0.43638570 0.24910438
F 0.90052164 0.06880285 0.267553158 -0.22705346 0.15381937
Local -0.40027421 0.87565736 0.745824347 0.07404444 0.80236148
Tourist 0.99104075 -0.21874437 0.003230515 -0.33441347 -0.11257820
APAC 1.00000000 -0.19167853 -0.041257422 -0.44563924 -0.12252271
MIDDLE EAST -0.19167853 1.00000000 0.483139408 -0.12468866 0.81224489
EUROPE -0.04125742 0.48313941 1.000000000 0.05884796 0.45935463
AMERICAS -0.44563924 -0.12468866 0.058847960 1.00000000 0.14499663
REST OF EMEIA -0.12252271 0.81224489 0.459354629 0.14499663 1.00000000
Create a univariate model with the following predictor variables: Target Variable: Count of Sales Predictor Variable: Female, Weight 1, Tourist, APAC
lm([target variable] ~ [predictor variables], data = [data source])
univariateModelSummer <- lm(formula = `Count of Sales` ~ `Weight 1`+F+Tourist+APAC, data=trainNN)
summary(univariateModelSummer)
Call:
lm(formula = `Count of Sales` ~ `Weight 1` + F + Tourist + APAC,
data = trainNN)
Residuals:
Min 1Q Median 3Q Max
-211.19 -77.87 -10.80 65.21 328.04
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 196.04098 82.66821 2.371 0.024577 *
`Weight 1` 0.56047 0.05907 9.488 2.15e-10 ***
F 1.60724 0.37337 4.305 0.000174 ***
Tourist 0.21750 0.53450 0.407 0.687049
APAC -0.34946 0.42444 -0.823 0.417030
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 120.9 on 29 degrees of freedom
Multiple R-squared: 0.9734, Adjusted R-squared: 0.9697
F-statistic: 265.1 on 4 and 29 DF, p-value: < 2.2e-16
Looking at the P values:
The p-value for Weight 1 is 2.15e-10. A very small value means that Weight 1 is probably an excellent addition to the model. The p-value for F is 0.000174. A very small value means that Female is probably an excellent addition to the model. The p-value for Tourist is 0.687049. This means a 69% chance this predictor is no meaningful. The p-value for APAC is 0.417030. This means a 42% chance this predictor is no meaningful.
Female and Weight 1 appear to be good variables for a prediction.
Lets check the residuals to see any trend.
plot(univariateModelSummer$residuals)
We can see a few curves in the residuals.
If we run a univariate Model again this time with the two predictors we get the following:
univariateModelSummer1 <- lm(formula = `Count of Sales` ~ (F + `Weight 1`), data=trainNN)
summary(univariateModelSummer1)
Call:
lm(formula = `Count of Sales` ~ (F + `Weight 1`), data = trainNN)
Residuals:
Min 1Q Median 3Q Max
-224.52 -76.18 -7.15 62.93 342.53
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 212.87488 79.23569 2.687 0.0115 *
F 1.53913 0.13343 11.535 9.52e-13 ***
`Weight 1` 0.57887 0.05102 11.346 1.44e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 119.5 on 31 degrees of freedom
Multiple R-squared: 0.9722, Adjusted R-squared: 0.9704
F-statistic: 541.7 on 2 and 31 DF, p-value: < 2.2e-16
When comparing the ‘Multiples in R-squared’ from both calculations we see that there is not much difference between the two but the Adjusted R-squared value accounts for more focused prediction.
Lets check the residuals:
plot(univariateModelSummer1$residuals)
When checking the residuals there is a slight pattern, quiet similar to the calculation with multiple variables.
Lets plot the prediction against the data to see how much we capture.
plot(y=trainNN$`Count of Sales`, x=trainNN$Calendar_Week)
abline(univariateModelSummer1)
only using the first two of 3 regression coefficients
The regression line doesnt appear on the graph. with error: “only using the first two of 3 regression coefficients”
Overall figures
Looking at the data, in Autumn and Winter you have the highest quantity of sales during this period with a noticable dip in quantity of sales in hotter seasons.
Winter
Looking at the main plot we can see figures rise in the december and january months. When looking for key factors, we found the following factors as good predictors by looking at the P-Values: AMERICAS WEIGHT 1
Summer
Looking at the main plot we can see figures rise in a slope when heading into Autumn. When looking for key factors, we found the following factors as good predictors by looking at the P-Values: FEMALE WEIGHT 1
So in conclusion based on P-values we can see that Heavy scarfs (Weight 1) is a good variable to predict quantity of sales in Winter and Summer. In Winter you can add AMERICAS value to improve the prediction and in Summer you can use FEMALE value to improve the prediction.